Exploratory Data Analysis


by Shaig E.Kazimov 


According to United Nations data, Azerbaijan has experienced steady population growth over the
last 20 years. From 2000 to 2020, the population of Azerbaijan increased from around 8 million
to over 10 million people. This population growth has been driven by both natural increase and
net migration.


In recent years, Azerbaijan has also experienced changes in its age structure, with a growing 
number of elderly people and a declining number of young people. This demographic shift is 
largely due to improvements in health care and declining fertility rates. 


Additionally, Azerbaijan has experienced urbanization in recent years, with a growing proportion 
of the population living in urban areas. This trend is expected to continue in the coming years, 
driven by economic and employment opportunities in urban centers. 


Overall, the demographic situation in Azerbaijan over the last 20 years has been characterized by 
steady population growth, aging of the population, and urbanization. These trends are expected 
to continue in the coming years and will likely have significant implications for the country's social, 
economic, and political development. 


Importing the necessary libraries: pandas, numpy, matplotlib, and 
statsmodels. 


In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing
import warnings

# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')


Load the population data for Azerbaijan into a pandas dataframe


# Load the dataset
data = pd.read_excel('Azerbaijan_population.xlsx')
data.head()


   Year  Population
0  1923      1863.0
1  1924      2128.7
2  1925      2162.9
3  1926      2314.6
4  1927      2366.0


# Convert the year to a datetime and use it as the index
data["Year"] = pd.to_datetime(data["Year"], format="%Y")
data = data.set_index("Year")


# Plot the original time series to visualize any trend or seasonal pattern
plt.figure(figsize=(10, 5))
plt.plot(data)
plt.xlabel("Year")
plt.ylabel("Population")
plt.title("Azerbaijan Population")
plt.show()


# First-order differencing; drop the first row, which is NaN after diff()
data_diff = data.diff(periods=1)
data_diff = data_diff[1:]
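

To make the differencing step concrete, here is a minimal pure-Python sketch of what diff(periods=1) followed by dropping the first row computes, using the first five population values shown by data.head(). The helper name first_difference is hypothetical and is not part of pandas.

```python
# First five population values from data.head() (1923-1927).
population = [1863.0, 2128.7, 2162.9, 2314.6, 2366.0]


def first_difference(values):
    """Return the change between each value and the one before it.

    Hypothetical helper mirroring data.diff(periods=1)[1:]: the first
    observation has no predecessor, so the result is one element shorter.
    """
    return [curr - prev for prev, curr in zip(values, values[1:])]


# Round to one decimal to hide floating-point noise in the subtraction.
changes = [round(change, 1) for change in first_difference(population)]
print(changes)
```

Each entry is the year-on-year change, which is the series the ARIMA model works with once the differencing order is set to 1.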


In [28]: # Let's plot the differenced data to see if it's stationary
plt.figure(figsize=(10, 5))
plt.plot(data_diff)
plt.xlabel("Year")
plt.ylabel("Population")
plt.title("Differenced Azerbaijan Population")
plt.show()


In [29]: # The differenced data appears to be stationary, so let's fit the ARIMA model
model = ARIMA(data, order=(1, 1, 0))
model_fit = model.fit()


# Let's print a summary of the model
print(model_fit.summary())


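                               SARIMAX Results
==============================================================================
Dep. Variable:             Population   No. Observations:                  100
Model:                 ARIMA(1, 1, 0)   Log Likelihood                -524.553
Date:                Thu, 09 Feb 2023   AIC                           1053.106
Time:                        10:08:54   BIC                           1058.296
Sample:                    01-01-1923   HQIC                          1055.206
                         - 01-01-2022
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.9078      0.030     30.180      0.000       0.849       0.967
sigma2      2320.5938    169.889     13.659      0.000    1987.618    2653.570
==============================================================================
Ljung-Box (L1) (Q):              5.10   Jarque-Bera (JB):               380.94
Prob(Q):                         0.02   Prob(JB):                         0.00
Heteroskedasticity (H):          0.06   Skew:                            -1.98
Prob(H) (two-sided):             0.00   Kurtosis:                        11.75
==============================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).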


Dep. Variable: The dependent variable of the model, in this case "Population".
No. Observations: The number of observations used in the model, which is 100 in this case.


Model: The type of model used, which is ARIMA(1, 1, 0). The three numbers in the parentheses
represent the order of the AR term, the degree of differencing, and the order of the MA term
respectively, so this model has one autoregressive term, first-order differencing, and no
moving-average term.


Log Likelihood: The log likelihood of the fitted model. Higher values indicate a better fit, but
the value is only comparable between models fitted to the same data.


AIC and BIC: Information criteria that trade goodness of fit against the number of estimated
parameters. When comparing models fitted to the same data, lower values indicate a better
balance between fit and complexity.


Sample: The date range of the observations used in the model, in this case 01-01-1923 to
01-01-2022.
Covariance Type: The method used to estimate the covariance matrix of the parameters, here "opg"
(outer product of gradients).


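Coefficients: The estimated parameters. This model has one AR coefficient (ar.L1) and the
estimated variance of the residuals (sigma2); there is no MA coefficient because the MA order is 0.
The "std err" column gives the standard error of each estimate.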


z and P>|z|: The z-statistic and associated p-value for each coefficient. If the p-value is below a 
certain significance level (usually 0.05), we reject the null hypothesis that the corresponding 
coefficient is equal to 0, indicating that the coefficient is statistically significant. 


Confidence Interval: The 95% confidence interval for each coefficient estimate. 


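Ljung-Box (L1) (Q): The Ljung-Box test statistic for testing the residuals of the model for
autocorrelation at lag 1.


Jarque-Bera (JB): The Jarque-Bera test statistic for testing the residuals for normality.
Prob(Q) and Prob(JB): The p-values for the Ljung-Box and Jarque-Bera tests respectively. Here
Prob(Q) is 0.02 and Prob(JB) is 0.00, so at the 5% level both null hypotheses (no residual
autocorrelation, normally distributed residuals) are rejected.


Heteroskedasticity (H): The test statistic for testing the residuals for heteroskedasticity
(non-constant variance).


Prob(H) (two-sided): The p-value for the heteroskedasticity test. Here it is 0.00, which points
to a residual variance that changes over the sample.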
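

Several of the quantities in the summary can be reproduced by hand, which helps when reading the table. The sketch below uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L with k = 2 estimated parameters (ar.L1 and sigma2). It assumes n = 99 effective observations, on the reasoning that one observation is lost to differencing; that choice is an assumption, and it is the value for which the BIC matches the printed one.

```python
import math

# Values copied from the SARIMAX summary.
log_likelihood = -524.553
coef, std_err = 0.9078, 0.030

k = 2        # estimated parameters: ar.L1 and sigma2
n_eff = 99   # assumption: 100 observations minus 1 lost to differencing

aic = 2 * k - 2 * log_likelihood
bic = k * math.log(n_eff) - 2 * log_likelihood

# Approximate 95% confidence interval: estimate +/- 1.96 standard errors.
ci_low = coef - 1.96 * std_err
ci_high = coef + 1.96 * std_err

print(f"AIC = {aic:.3f}")
print(f"BIC = {bic:.3f}")
print(f"95% CI for ar.L1 = [{ci_low:.3f}, {ci_high:.3f}]")
```

Dividing the printed coefficient by the printed standard error gives a z-statistic slightly different from 30.180, because the standard error is shown rounded to three decimals.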
In [31]: # Get the residuals of the ARIMA model
residuals = model_fit.resid


# Plot the residuals
plt.figure(figsize=(10, 5))
plt.plot(residuals, label='Residuals')
plt.xlabel("Year")
plt.ylabel("Residuals")
plt.title("ARIMA Model Residuals")
plt.legend()
plt.show()


In [47]: # Make predictions for the next 9 years (2023 to 2031)
future_years = np.arange(2023, 2032)
future_predictions = model_fit.forecast(steps=9)


In [48]: print(future_predictions)


2023-01-01    10190.259123
2024-01-01    10220.994786
2025-01-01    10248.895125
2026-01-01    10274.221693
2027-01-01    10297.211918
2028-01-01    10318.081324
2029-01-01    10337.025555
2030-01-01    10354.222205
2031-01-01    10369.832487
Freq: AS-JAN, Name: predicted_mean, dtype: float64
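

These forecast values show the structure of an ARIMA(1, 1, 0) model without a constant: each year-on-year increment is the previous increment multiplied by the AR coefficient, so the increments shrink geometrically and the forecast path flattens. A minimal sketch using only the printed forecasts; the variable names are illustrative.

```python
# Forecasts printed in the notebook output (2023-2031).
forecasts = [
    10190.259123, 10220.994786, 10248.895125, 10274.221693, 10297.211918,
    10318.081324, 10337.025555, 10354.222205, 10369.832487,
]

# Year-on-year increments of the forecast path.
increments = [b - a for a, b in zip(forecasts, forecasts[1:])]

# Ratio of successive increments; for ARIMA(1, 1, 0) without a constant
# this should be close to the fitted ar.L1 coefficient.
ratios = [later / earlier for earlier, later in zip(increments, increments[1:])]
phi_hat = ratios[0]

# Rebuild the path from the first two forecasts using the recursion
# y[t+1] = y[t] + phi * (y[t] - y[t-1]).
rebuilt = forecasts[:2]
while len(rebuilt) < len(forecasts):
    rebuilt.append(rebuilt[-1] + phi_hat * (rebuilt[-1] - rebuilt[-2]))

max_error = max(abs(r - f) for r, f in zip(rebuilt, forecasts))
print(f"estimated phi: {phi_hat:.4f}")
print(f"max rebuild error: {max_error:.6f}")
```

The estimated ratio agrees with the ar.L1 coefficient in the summary, which is why the ARIMA forecast grows more and more slowly toward 2031.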


In [81]: import seaborn as sns


In [93]: df = pd.DataFrame(future_predictions)
df.reset_index(inplace=True)
df.head()
Out[93]:
       index  predicted_mean
0 2023-01-01    10190.259123
1 2024-01-01    10220.994786
2 2025-01-01    10248.895125
3 2026-01-01    10274.221693
4 2027-01-01    10297.211918


In [97]: plt.figure(figsize=[12, 6])
sns.lineplot(x='index', y='predicted_mean', data=df, color='blue')
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("ARIMA Model Future Predictions")
plt.xticks(rotation=90)
plt.show()


The time series data is split into training and testing sets, with the last
10 years of data being set aside for testing.


In [49]: # Split the data into training and testing sets
train = data[:-10] 
test = data[-10:] 
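

As a quick check on what the two slices select, the same slicing applied to a plain list of years gives the sizes and boundaries that appear elsewhere in the notebook. This sketch assumes one observation per year from 1923 to 2022, as reported in the ARIMA summary.

```python
# One observation per year, 1923-2022 (100 observations, as in the summary).
years = list(range(1923, 2023))

train_years = years[:-10]   # everything except the last 10 years
test_years = years[-10:]    # the last 10 years, held out for testing

print(len(train_years), train_years[0], train_years[-1])
print(len(test_years), test_years[0], test_years[-1])
```

The training set therefore has 90 observations ending in 2012, and the test set covers 2013 to 2022.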


The Exponential Smoothing model is fit to the training data, using a
seasonal period of 10, a trend of "add", and a seasonal component of "add".


model_exp = ExponentialSmoothing(train,
                                 seasonal_periods=10,
                                 trend='add',
                                 seasonal='add')
model_exp = model_exp.fit()


In [63]: print(model_exp.summary())


                       ExponentialSmoothing Model Results
================================================================================
Dep. Variable:               Population   No. Observations:                   90
Model:             ExponentialSmoothing   SSE                         535370.354
Optimized:                         True   AIC                            810.181
Trend:                         Additive   BIC                            845.179
Seasonal:                      Additive   AICC                           817.633
Seasonal Periods:                    10   Date:                 Thu, 09 Feb 2023
Box-Cox:                          False   Time:                         11:41:51
Box-Cox Coeff.:                    None
================================================================================
                          coeff        code    optimized
--------------------------------------------------------------------------------
smoothing_level       0.7959876       alpha         True
smoothing_trend       0.1302668        beta         True
smoothing_seasonal    0.0521223       gamma         True
initial_level         2178.5919         l.0         True
initial_trend         34.547983         b.0         True
initial_seasons.0    -59.132334         s.0         True
initial_seasons.1    -44.514305         s.1         True
initial_seasons.2    -47.734239         s.2         True
initial_seasons.3    -18.598575         s.3         True
initial_seasons.4    -7.4917421         s.4         True
initial_seasons.5    -6.7524900         s.5         True
initial_seasons.6    -4.4845939         s.6         True
initial_seasons.7    -1.3801127         s.7         True
initial_seasons.8     2.0152882         s.8         True
initial_seasons.9    -6.4063965         s.9         True
--------------------------------------------------------------------------------


The model has been fit to a time series with 90 observations, where the dependent variable is
"Population." The SSE (Sum of Squared Errors) is 535370.354 and the AIC (Akaike Information
Criterion) is 810.181.
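

The AIC quoted here, together with the BIC and AICC in the summary, can be recomputed from the SSE. The sketch below uses AIC = n ln(SSE/n) + 2k, BIC = n ln(SSE/n) + k ln(n), and the small-sample correction AICC = AIC + 2(k+2)(k+3)/(n-k-3). The parameter count k = 14 is an assumption inferred from the printed values, not taken from the statsmodels documentation.

```python
import math

# Values copied from the ExponentialSmoothing summary.
n = 90
sse = 535370.354
k = 14  # assumption: chosen so that the printed AIC is reproduced

base = n * math.log(sse / n)
aic = base + 2 * k
bic = base + k * math.log(n)
aicc = aic + 2 * (k + 2) * (k + 3) / (n - k - 3)

print(f"AIC  = {aic:.3f}")
print(f"BIC  = {bic:.3f}")
print(f"AICC = {aicc:.3f}")
```

Because all three criteria start from the same SSE term, they differ only in how heavily they penalize the number of parameters.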


The values of the smoothing parameters (alpha, beta, and gamma) are 0.7959876, 0.1302668,
and 0.0521223 respectively. These parameters control how quickly the level, the trend, and the
seasonal component adapt to new observations: values near 1 put most of the weight on recent
data, while values near 0 let the component change only slowly. Here the level reacts quickly
(alpha is about 0.80), whereas the trend and the seasonal pattern change slowly. The values have
been optimized to produce the best fit to the data.


The initial values for the level (l.0), trend (b.0), and seasonal component (s.0-s.9) are also
provided in the output. The seasonal component has a period of 10, and the model uses an
additive trend and seasonal component.
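

The role of the three smoothing parameters is easiest to see in the update equations themselves. Below is a minimal sketch of the textbook additive Holt-Winters recursion. It is not the statsmodels implementation, whose parameterization and initialization may differ in detail, and the function name holt_winters_additive is hypothetical.

```python
def holt_winters_additive(y, alpha, beta, gamma, level0, trend0, seasons0, horizon):
    """Textbook additive Holt-Winters recursion (illustrative sketch).

    level0, trend0 and seasons0 are the initial level, trend and one full
    cycle of seasonal offsets. Returns forecasts for 1..horizon steps
    after the end of y.
    """
    level, trend = level0, trend0
    seasons = list(seasons0)   # seasons[t % m] holds the latest offset for that phase
    m = len(seasons)

    for t, obs in enumerate(y):
        season = seasons[t % m]
        prev_level, prev_trend = level, trend
        # alpha: how strongly the level follows the deseasonalized observation
        level = alpha * (obs - season) + (1 - alpha) * (prev_level + prev_trend)
        # beta: how strongly the trend follows the latest change in level
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        # gamma: how strongly the seasonal offset follows the latest deviation
        seasons[t % m] = gamma * (obs - prev_level - prev_trend) + (1 - gamma) * season

    n = len(y)
    return [level + h * trend + seasons[(n + h - 1) % m] for h in range(1, horizon + 1)]


# A perfectly linear series with no seasonal pattern: the forecasts should
# simply continue the line, whatever the smoothing weights are.
series = [10.0, 12.0, 14.0, 16.0]
result = holt_winters_additive(series, 0.8, 0.13, 0.05,
                               level0=8.0, trend0=2.0, seasons0=[0.0, 0.0], horizon=3)
print(result)
```

With real data the one-step errors are not zero, and alpha, beta, and gamma decide how much of each error is absorbed by the level, the trend, and the seasonal offsets.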


In [80]: # Get the residuals of the Exponential Smoothing model
residuals_exp = model_exp.resid


# Plot the residuals
plt.figure(figsize=(10, 5))
plt.plot(residuals_exp, label='Residuals')
plt.xlabel("Year")
plt.ylabel("Residuals")
plt.title("Exponential Smoothing Model Residuals")
plt.legend()
plt.show()


The model is used to generate forecasts for the population of
Azerbaijan up to the end of 2031.


In [64]: # Forecast the population of Azerbaijan up to 2032
forecast = model_exp.predict(start=test.index[0], end='2031-12-31')


The actual population data, testing data, and forecast are plotted
using matplotlib and displayed in a single figure.


In [65]: # Plot the actual population data and the forecast
plt.figure(figsize = [12,6]) 
plt.plot(train.index, train, label='Training Data', color='blue', linewi 
plt.plot(test.index, test, label='Testing Data', color='orange', linewid 
plt.plot(forecast.index, forecast, label='Forecast', color='green', 
plt.xlabel('Year', fontsize=14) 
plt.ylabel('Population', fontsize=14) 
plt.title('Azerbaijan Population Forecast', fontsize=18) 
plt.legend(fontsize=12) 
plt.grid(True)
plt.tight_layout()
plt.show()


In [66]: forecast


Out[66]:
2013-01-01     9272.325292
2014-01-01     9376.224825
2015-01-01     9479.451792
2016-01-01     9606.017186
2017-01-01     9722.603725
2018-01-01     9826.964119
2019-01-01     9933.086967
2020-01-01    10038.444405
2021-01-01    10144.961145
2022-01-01    10234.582427
2023-01-01    10279.960697
2024-01-01    10383.860229
2025-01-01    10487.087196
2026-01-01    10613.652590
2027-01-01    10730.239130
2028-01-01    10834.599524
2029-01-01    10940.722371
2030-01-01    11046.079809
2031-01-01    11152.596550
2032-01-01    11242.217831
Freq: AS-JAN, dtype: float64
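

One property of these forecasts can be read directly from the numbers: with an additive trend and a seasonal period of 10, forecasts made from the same origin that lie exactly 10 years apart share the same seasonal offset, so they differ only by ten years of trend. A minimal sketch using the 20 forecast values in the output (2013 to 2032); the variable names are illustrative.

```python
# Forecast values for 2013-2032 from the notebook output.
forecast_values = [
    9272.325292, 9376.224825, 9479.451792, 9606.017186, 9722.603725,
    9826.964119, 9933.086967, 10038.444405, 10144.961145, 10234.582427,
    10279.960697, 10383.860229, 10487.087196, 10613.652590, 10730.239130,
    10834.599524, 10940.722371, 11046.079809, 11152.596550, 11242.217831,
]
period = 10

# Difference between each forecast and the one a full seasonal cycle earlier.
cycle_gaps = [later - earlier
              for earlier, later in zip(forecast_values, forecast_values[period:])]

# Implied trend per year, in the same units as the data.
trend_per_year = sum(cycle_gaps) / len(cycle_gaps) / period

print([round(gap, 3) for gap in cycle_gaps])
print(f"implied trend per year: {trend_per_year:.3f}")
```

All ten gaps coincide, which confirms that the forecast is a straight trend line with a repeating 10-year pattern added on top.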


In [99]: dfl = pd.DataFrame(forecast)
dfl.reset_index(inplace=True)
dfl.columns = ['Year', 'Forecast']
dfl.head()


Out[99]:
        Year     Forecast
0 2013-01-01  9272.325292
1 2014-01-01  9376.224825
2 2015-01-01  9479.451792
3 2016-01-01  9606.017186
4 2017-01-01  9722.603725


In [100]: plt.figure(figsize=[12, 6])
sns.lineplot(x='Year', y='Forecast', data=dfl, color='blue')
plt.xlabel("Date")
plt.ylabel("Value")
plt.title("Exponential Smoothing Model Future Predictions")
plt.xticks(rotation=90)
plt.show()


The forecast is a population prediction for Azerbaijan for the years in the test set, as well as
for the years up to the end of 2031. The forecast was generated using the Exponential Smoothing
method, which considers both the trend and the seasonality of the data.


The actual population data, the testing data, and the forecast are displayed in a line plot, with
the training data in blue, the testing data in orange, and the forecast in green. The plot shows
how well the Exponential Smoothing model fits the training data and how well it predicts the
testing data.
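

How well the model predicts the testing data can also be put into numbers with an error metric such as the mean absolute error (MAE) or the mean absolute percentage error (MAPE). The sketch below is illustrative only: the function names are hypothetical, and the actual values are made-up placeholders rather than the notebook's test data, which is not printed in this document. In the notebook one would pass the test values and the first ten forecasts instead.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error as a percentage of the actual values."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# First three forecasts from the notebook output (2013-2015).
predicted = [9272.325292, 9376.224825, 9479.451792]
# PLACEHOLDER actuals, invented for illustration only.
actual = [9300.0, 9400.0, 9500.0]

mae = mean_absolute_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
print(f"MAE:  {mae:.3f}")
print(f"MAPE: {mape:.3f}%")
```

MAE is expressed in the units of the data, while MAPE is scale-free, which makes it easier to compare across series of different size.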


It's important to interpret the result of the forecast in the context of the problem and data, 
considering factors such as the quality of the data, the choice of forecasting method, and the 
accuracy of the forecast. This code provides a basic example of time series forecasting, and it is 
up to the user to further validate and interpret the results based on their specific use case. 


THANK YOU FOR YOUR ATTENTION!!!